September 29, 2016

Lecture 7 Main Points Once Again

  • Marginal probabilities. Compute marginals of variables (given model parameters \(\mathbf{\theta}\)): \(p(x_i\mid \mathbf{\theta})=\sum_{\mathbf{x}': x_i'=x_i}p(\mathbf{x}'\mid \mathbf{\theta})\) (or, more generally, posterior distributions, also known as query probabilities).

  • Technique: variable elimination, which avoids the brute-force computation whose cost is exponential in the dimension

  • Why it works
    • Use the fact that some factors only involve a small number of variables
    • By computing intermediate factors and caching the results, we avoid duplicated calculations
  • Q: Suppose we have calculated a particular query probability \(p(x_1\mid x_6)\), and now we want to calculate \(p(x_4\mid x_3)\). How can we share work across the two computations?

  • A: This motivates the study of more sophisticated graph representations, including factor graphs and tree representations of undirected graphs (UGs).
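As a baseline, the marginal definition above can be evaluated by direct summation; a minimal sketch on a hypothetical 6-variable binary chain model (the pairwise potential `psi` is made up for illustration), showing the \(2^n\) cost that variable elimination is designed to avoid:

```python
import itertools

# Hypothetical model: a chain of n = 6 binary variables with
# unnormalized joint p(x) proportional to prod_i psi(x_i, x_{i+1}).
def psi(a, b):
    return 2.0 if a == b else 1.0  # made-up "attractive" potential

n = 6

def joint(x):
    # Unnormalized probability of one full assignment x (tuple of 0/1).
    p = 1.0
    for i in range(n - 1):
        p *= psi(x[i], x[i + 1])
    return p

def marginal_brute(i, v):
    # p(x_i = v): sum the joint over all 2^n assignments -- exponential cost.
    assignments = list(itertools.product((0, 1), repeat=n))
    total = sum(joint(x) for x in assignments)
    match = sum(joint(x) for x in assignments if x[i] == v)
    return match / total

print(marginal_brute(0, 0))  # p(x_1 = 0)
```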

Example of variable elimination: \(p(x_1\mid x_6)\)
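One way this example can be carried out in code — a sketch on a hypothetical 6-variable binary chain with a made-up pairwise potential `psi` (not necessarily the model from the slides): clamp \(x_6\) and eliminate \(x_5,\dots,x_2\) in turn, so every intermediate factor involves only a single variable.

```python
# Variable elimination for p(x_1 | x_6) on a hypothetical binary chain
# x_1 - x_2 - ... - x_6 with made-up pairwise potentials.
def psi(a, b):
    return 2.0 if a == b else 1.0  # example attractive potential

def p_x1_given_x6(e):
    # Clamp x_6 = e, then eliminate x_5, x_4, x_3, x_2 in turn.
    # Each elimination produces an intermediate factor (a "message")
    # over one variable, so the cost is linear in the chain length
    # rather than exponential.
    m = [1.0 if b == e else 0.0 for b in (0, 1)]  # clamp x_6
    for _ in range(4):
        m = [sum(psi(a, b) * m[b] for b in (0, 1)) for a in (0, 1)]
    z = sum(m)
    return [v / z for v in m]  # normalized p(x_1 | x_6 = e)

print(p_x1_given_x6(0))
```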

Factor Graphs

  • We've shown how to use variable elimination to efficiently compute marginal probabilities on DAGs
  • We now extend this to undirected graphs using factor graphs, which generalize both DAGs and UGs.
  • DAGs and UGs focus on capturing the conditional independencies of a probability distribution through (d-)separation relations.
  • Factor graphs instead aim to capture the factorization scheme of a multivariate joint distribution. (That is, we focus on the exponential representation of the distribution.)

Examples: Factor Graphs

  • Given only the graph structure, the factor graph representation is not unique.
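A small illustration of this non-uniqueness, with factors made up for the purpose: the same normalized distribution over two binary variables can be written as a single factor node over \(\{x_1, x_2\}\) or as three factor nodes (two unary, one pairwise), giving two different factor graphs.

```python
import itertools

# Two different factorizations of the same distribution over binary
# (x1, x2) -- all factors here are invented for illustration.
def f(a, b):  # one factor node covering both variables
    return (a + 1) * (b + 2) * (2.0 if a == b else 1.0)

def g1(a):  return a + 1.0                      # unary factor on x1
def g2(b):  return b + 2.0                      # unary factor on x2
def g12(a, b):  return 2.0 if a == b else 1.0   # pairwise factor

def normalized_table(fn):
    # Normalize an unnormalized factor product into a distribution.
    vals = {x: fn(*x) for x in itertools.product((0, 1), repeat=2)}
    z = sum(vals.values())
    return {x: v / z for x, v in vals.items()}

t1 = normalized_table(f)                                        # one factor node
t2 = normalized_table(lambda a, b: g1(a) * g2(b) * g12(a, b))   # three factor nodes
print(t1)
print(t2)
```

Both tables are identical, yet the two factor graphs have different structure (one factor node versus three).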

Sum-Product Algorithms for Factor Trees

  • Tree-structured factor graph: the undirected graph obtained by ignoring the distinction between variable nodes and factor nodes is a tree.
  • On these graphs, exact inference is possible.
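The tree condition above is easy to check mechanically — a sketch where a factor graph is given as an edge list between variable and factor node names (all names hypothetical); the combined graph is a tree iff it is connected with exactly \(|V|-1\) edges:

```python
from collections import deque

def is_factor_tree(edges):
    # Treat variable and factor nodes uniformly, per the definition above.
    nodes = {u for e in edges for u in e}
    adj = {u: [] for u in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # BFS to test connectivity.
    start = next(iter(nodes))
    seen = {start}
    q = deque([start])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                q.append(w)
    # Connected and acyclic <=> connected with |E| = |V| - 1.
    return len(seen) == len(nodes) and len(edges) == len(nodes) - 1

chain = [("x1", "f12"), ("f12", "x2"), ("x2", "f23"), ("f23", "x3")]
loop = chain + [("x3", "f13"), ("f13", "x1")]
print(is_factor_tree(chain), is_factor_tree(loop))
```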

Sum-Product Algorithm for Factor Trees

Example: Sum-Product Algorithm

  • Calculate the marginal distribution \(p(x_1)\): \[p(x_1)=\sum_{x_2}\cdots\sum_{x_6}p(\mathbf{x})\]
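A minimal sum-product sketch on a hypothetical 6-variable binary chain (the pairwise potential `psi` is invented; the lecture's example graph may differ): one forward and one backward message pass yield every marginal \(p(x_i)\) at once — exactly the work-sharing that running variable elimination separately per query misses.

```python
# Hypothetical chain model: p(x) proportional to prod_i psi(x_i, x_{i+1}).
def psi(a, b):
    return 2.0 if a == b else 1.0  # made-up pairwise potential

n = 6

# Forward messages: fwd[i][a] sums out x_1..x_{i-1} with x_i = a fixed.
fwd = [[1.0, 1.0]]
for _ in range(n - 1):
    prev = fwd[-1]
    fwd.append([sum(psi(b, a) * prev[b] for b in (0, 1)) for a in (0, 1)])

# Backward messages: bwd[i][a] sums out x_{i+1}..x_n with x_i = a fixed.
bwd = [[1.0, 1.0]]
for _ in range(n - 1):
    prev = bwd[-1]
    bwd.append([sum(psi(a, b) * prev[b] for b in (0, 1)) for a in (0, 1)])
bwd.reverse()

def marginal(i):
    # p(x_i) is proportional to the product of the two incoming messages.
    unnorm = [fwd[i][a] * bwd[i][a] for a in (0, 1)]
    z = sum(unnorm)
    return [v / z for v in unnorm]

print(marginal(0))  # p(x_1)
```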

Extensions

  • What if the factor graph is not a tree? The junction-tree algorithm clusters maximal cliques into supernodes, which then form a tree.
  • What if we care about the configuration with the maximum probability? Use the Max-Product or Max-Sum algorithm.
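A max-product sketch on the same kind of hypothetical binary chain (potentials invented for illustration): replace the sums of sum-product with maxes, keep back-pointers, and backtrack to read off the single most probable configuration.

```python
# Hypothetical 6-node binary chain with made-up potentials.
def psi(a, b):
    return 2.0 if a == b else 1.0  # pairwise potential favoring agreement

def phi(i, a):
    # Unary potential nudging x_1 toward 1 so the maximizer is unique.
    return 3.0 if (i == 0 and a == 1) else 1.0

n = 6
m = [phi(0, a) for a in (0, 1)]  # forward max-message into x_1
back = []                        # back-pointers for backtracking
for i in range(1, n):
    scores = [[psi(b, a) * m[b] for b in (0, 1)] for a in (0, 1)]
    back.append([max((0, 1), key=lambda b: scores[a][b]) for a in (0, 1)])
    m = [phi(i, a) * max(scores[a]) for a in (0, 1)]

# Backtrack from the best value of x_6 to recover the full configuration.
best = [max((0, 1), key=lambda a: m[a])]
for ptrs in reversed(back):
    best.append(ptrs[best[-1]])
best.reverse()
print(best)  # most probable configuration (x_1, ..., x_6)
```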

Comment

  • Next Lecture: Belief propagation demonstration on Hidden Markov Models
    • Forward-backward algorithm (special case of Sum-Product algorithm)
    • Viterbi algorithm (special case of Max-Product or Max-Sum algorithm)
  • Optional reading:
    • Yedidia, J.S., Freeman, W.T. and Weiss, Y., 2003. Understanding belief propagation and its generalizations. Exploring artificial intelligence in the new millennium, 8, pp.236-239.